In order to build a spatial-temporal model to predict PM2.5 in Beijing, we collect the air quality data and meteorology data from 2017-01-31 to 2018-01-31 in several monitoring stations in Beijing. In addition, in order to analyze the Beijing air pollutants in the district scale, we also get the geographical boundaries data of all the districts in Beijing.
The plot below shows the 18 districts in Beijing, colored by their area and labeled by their English name. The small districts in the middle indicates the city center, while the larger ones along the city boarder are suburban areas.
The air quality data are collected from 35 different air quality stations across the city. The distribution of the air quality monitoring stations is not really uniform. More stations are crowded in the city center, while less are located in the larger suburban districts.
The air quality data includes 6 kinds of pollutants, \(PM2.5\), \(PM10\), \(NO_2\), \(CO\), \(O_3\) and \(SO_2\).
The meteorology data are collected from 18 different meteorology stations across the city. The distribution of the meteorology monitoring stations also concentrates in the city center, but more uniform than that of air quality stations.
The meteorology data includes 6 meteorology parameters, temperature, pressure, humidity, wind direction, wind speed and weather (categorical variable).
The distributions of the air quality and meteorology stations are shown in the following plot.
We convert the UTC time to Beijing time, create factor variables, date, month, weekdays, hour from Beijing time, and transform the obviously mistaken expressions (such as ‘999999’) to the missing value expressions, ‘NA’. (Code chunk hidden.)
We combine the districts data and air quality data, districts data and meteorology data in each monitoring stations respectively. We use longitude and latitude of the air quality and meteorology stations, and locate them in their corresponding districts. (Code chunk hidden.)
We first summarise all the obervations of air quality and meteorology data by districts respectively and then combine the air quality and meteorology data in the same districts. (Code chunk hidden.)
We first add missing Beijing time and make the dataset a complete dataset with no missing values in the time variable. Then we fill the missing value with its most recent previous observation respectively in each station. (Code chunk hidden.)
In order to analyze PM2.5 in the time series scale, we pick one of the most busy districts in Beijing, Haidian, where most of the univeristies in Beijing locate, as our target district.
We first filter the air quality and meteorology data and get all the observations in Haidian district. Then we summarise Haidian data by date and get daily observations of all the air pollutants and meteorology variables in Haidian district. (Code chunk hidden.)
The plots below shows the monthly, weekly, and daily trend of PM2.5 concentration.
In the following steps, we build ARIMA(1,1,0) model for Haidian hourly PM2.5 data and ARIMA(0,0,2) model for Haidian daily PM2.5 data respectively. The specific steps of models selections is shown in the code below, but will not be illustrated in detailed here.
The selection process, fitted plot, and rmse of ARIMA(1,1,0) model is shown below:
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
## [1] "The RMSE of the ARIMA(1,1,0) model of the Haidian hourly PM2.5 data is 9.14351589257801"
The selection process, fitted plot, and rmse of ARIMA(0,0,2) model is shown below:
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
## [1] "The RMSE of the ARIMA(0,0,2) model of the Haidian daily PM2.5 data is 34.7638292046742"
In the following steps, we build a Guassian Process model using the daily PM2.5 and time data only. Since the jags model is too long to run, we read the parameters, \(l\), \(\sigma^2\) and \(\sigma^2_w\) from the variogram plots directly. Below is the variogram plots:
We pick \(\sigma^2 = 500\), \(l = \sqrt{3}/5,\) and \(\sigma^2_w = 1300\) to fit the Gaussian Process. Below is the full posterior predictive distribution plot of Haidian daily PM2.5.
## [1] "The RMSE of the Gaussian Process model of the Haidian daily PM2.5 data is 54.293558327715"
From the above plot, we can see that the prediction of PM2.5 is clearly overfitted, which may result from inaccurate estimation of parameter \(l\).
In the following step, we first build a linear regression model based on other air pollutants and the weather conditions. The specific exploratory data analysis (EDA) and model selection procudures is shown in the codes below, but will not be illustrated in detailed here.
Then we use the residuals of the linear regression model to build a Gaussian Process model to add the time-related autoregression part of PM2.5 data.
Theoretically, the hierarchical model is a better choice. However, considering the computational cost and the capacity of our laptops, we have to abandon the Bayesian methods.
The following plots show the EDA part, and the results show the model selection procedures. We use the selected model as our final model to further predict its residual using Gaussian Process.
We start with the selected linear model:
##
## Call:
## lm(formula = PM2.5 ~ PM10 + NO2 + CO + O3 + pressure + humidity +
## wind_direction, data = haidian_date)
##
## Residuals:
## Min 1Q Median 3Q Max
## -61.187 -7.660 -0.821 7.127 66.540
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -662.70424 103.10985 -6.427 4.15e-10 ***
## PM10 0.45684 0.01490 30.655 < 2e-16 ***
## NO2 0.13479 0.06162 2.188 0.0293 *
## CO 32.12255 2.66877 12.036 < 2e-16 ***
## O3 0.11774 0.01813 6.496 2.77e-10 ***
## pressure 0.60117 0.09985 6.021 4.30e-09 ***
## humidity 0.40818 0.04436 9.201 < 2e-16 ***
## wind_direction 0.03698 0.01524 2.426 0.0158 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.38 on 358 degrees of freedom
## Multiple R-squared: 0.9059, Adjusted R-squared: 0.904
## F-statistic: 492.1 on 7 and 358 DF, p-value: < 2.2e-16
## [1] "The RMSE of the selected linear regression model of the Haidian daily PM2.5 data is 13.236979518667"
Then we use the residual of the selected linear model to conduct Gaussian Process. To start, below are the variogram plots that we use to decide parameters:
We set \(\sigma^2 = 50\), \(l = \sqrt{3}/5\) and \(\sigma^2_w = 130\), and fit Gaussian Process model on the linear regression residuals. The fitted values and the rmse are given below:
## [1] "The RMSE of the Linear Regression and Guassian Process model of the Haidian daily PM2.5 data is 4.50924367606426e-05"
From the above plot, we can see that the prediction of the linear regression residuals is clearly overfitted, which may result from inaccurate estimation of parameter \(l\).
If possible, the hierarchical model with enough number of iterations can provide more accurate estimation of Gaussian Process parameters. However, due to the limit of our laptop, we can only run 100 interations of jags in the following part, which has obviously not converged.
In the following steps, we run the hierarchical model for 100 iterations. The estimations and fitted values are given below. Obviously, the mcmc chain has not coverged.
| term | post_mean | post_med | post_lower | post_upper |
|---|---|---|---|---|
| beta[1] | 67.530 | 67.049 | 63.647 | 72.411 |
| beta[2] | -0.069 | -0.058 | -0.124 | -0.033 |
| beta[3] | 0.000 | 0.000 | 0.000 | 0.000 |
| l | 0.711 | 0.711 | 0.682 | 0.762 |
| sigma2 | 120.482 | 121.881 | 113.344 | 125.526 |
| sigma2_w | 113.004 | 112.619 | 107.960 | 118.317 |
In the following steps, we build a random forest model to predict the PM2.5 concentration. The detailed model selection procedures will not be illustrated.
The full model and selected model important variables, and rmse are show below:
## [1] "The RMSE of the random forest model of the Haidian hourly PM2.5 data is 8.05392607540167"
From the above result, we can see that the random forest model performs better than the linear model and the pure Gaussian Process model with only historical PM2.5 data, but is not as good as the combination of linear regression and Gaussian process model.
The RMSE of all the temporal models are summarised as follows:
| Model | RMSE |
|---|---|
| Hourly ARIMA | 9.1435159 |
| Daily ARIMA | 34.7638292 |
| Daily Gaussian Process | 54.2935583 |
| Daily lm Gaussian Process | 0.0000451 |
| Daily Random Forest | 8.0539261 |
The Bayesian Gaussian Process that did not converge is not included.
Among all the temporal models, the lm Gaussian Process model using air quality data has the lowest rmse and fitted our PM2.5 data the most. However, due to extreme low RMSE and its highly closed predicted & observed lines, we can say that overfitting problem does exist.
In order to build the spatial models, we pick 2017-10-03 as our target date, and try to fit spatial models (areal and point reference) for the PM2.5 concentration. The following plots show the PM2.5 concentration in each station points and the average PM2.5 of all the stations in each districts respectively, where the color illustrates the daily average PM2.5 concentration in \(\mu g/m^3\).
For areal models, we consider the districts that touch each other as neighbors. However, since the shapefile is not official and has some intersecting boundaries, we consider the districts that either touch or overlap as neighbors. Below shows the resulted adjacency plot, where the neighbors are connected by the red lines.
We first conduct Moran I test and Geary C test to see if there exists spatial autocorrelation:
moran.test(areal_aq$PM2.5, listw)
##
## Moran I test under randomisation
##
## data: areal_aq$PM2.5
## weights: listw
##
## Moran I statistic standard deviate = 3.3844, p-value = 0.0003567
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.37201655 -0.05882353 0.01620602
geary.test(areal_aq$PM2.5, listw)
##
## Geary C test under randomisation
##
## data: areal_aq$PM2.5
## weights: listw
##
## Geary C statistic standard deviate = 3.2466, p-value = 0.0005839
## alternative hypothesis: Expectation greater than statistic
## sample estimates:
## Geary C statistic Expectation Variance
## 0.44762872 1.00000000 0.02894638
Both p-values are significant, so spatial autocorrelation exists. Since Moran I is positive and Geary C is smaller than one, we have positive spatial autocorrelation.
First we conduct CAR and SAR models using spautolm and only PM 2.5 concentrations.
mod_car = spautolm(formula = areal_aq$PM2.5~1, listw = listw, family = "CAR")
summary(mod_car)
##
## Call:
## spautolm(formula = areal_aq$PM2.5 ~ 1, listw = listw, family = "CAR")
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.37455 -1.75497 0.11642 1.84528 5.76747
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 16.0642 1.2673 12.676 < 2.2e-16
##
## Lambda: 0.15953 LR test value: 3.8619 p-value: 0.049395
## Numerical Hessian standard error of lambda: 0.038108
##
## Log likelihood: -45.06367
## ML residual variance (sigma squared): 7.8995, (sigma: 2.8106)
## Number of observations: 18
## Number of parameters estimated: 3
## AIC: 96.127
mod_sar = spautolm(formula = areal_aq$PM2.5~1, listw = listw, family = "SAR")
summary(mod_sar)
##
## Call:
## spautolm(formula = areal_aq$PM2.5 ~ 1, listw = listw, family = "SAR")
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.02786 -2.00524 -0.28217 1.57925 5.43650
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 16.6436 1.3825 12.039 < 2.2e-16
##
## Lambda: 0.13449 LR test value: 4.919 p-value: 0.026563
## Numerical Hessian standard error of lambda: 0.046385
##
## Log likelihood: -44.53512
## ML residual variance (sigma squared): 7.3201, (sigma: 2.7056)
## Number of observations: 18
## Number of parameters estimated: 3
## AIC: 95.07
Both are significant at 0.05 significance level, and the positive lambda’s indicate positive autocorrelation, which accords with our Moran I and Geary C test. The fitted values and residuals are shown below, and the two models provide similar results.
And we check that the spatial autocorrelation of the residuals are not significant:
moran.test(areal_aq$car_resid , listw)$p.value
## [1] 0.5804783
moran.test(areal_aq$sar_resid , listw)$p.value
## [1] 0.2927824
Then we move on to the Bayesian version of CAR model using stan, where the model and data used is specified in the code below, and 10000 iterations are conducted.
car_model = "
data {
int<lower=0> N;
vector[N] y;
matrix[N,N] W;
matrix[N,N] D;
}
parameters {
vector[N] w_s;
real beta;
real<lower=0> sigma2;
real<lower=0> sigma2_w;
real<lower=0,upper=1> phi;
}
transformed parameters {
vector[N] y_pred = beta + w_s;
}
model {
matrix[N,N] Sigma_inv = (D - phi * W) / sigma2;
w_s ~ multi_normal_prec(rep_vector(0,N), Sigma_inv);
beta ~ normal(0,10);
sigma2 ~ cauchy(0,5);
sigma2_w ~ cauchy(0,5);
y ~ normal(beta+w_s, sigma2_w);
}
"
data = list(
N = nrow(areal_aq),
y = areal_aq$PM2.5,
W = W * 1,
D = diag(rowSums(W))
)
The estimation, prediction and residuals are shown below:
| term | post_mean | post_med | post_lower | post_upper |
|---|---|---|---|---|
| beta | 15.458 | 15.620 | 11.672 | 18.206 |
| phi | 0.727 | 0.821 | 0.078 | 0.991 |
| sigma2 | 20.969 | 19.135 | 1.226 | 55.598 |
| sigma2_w | 1.646 | 1.382 | 0.170 | 3.994 |
The traceplots of beta and phi reasonably behaved, while the variances terms are relatively large. The range of the resiuals is obviously narrower than the previous CAR model. So the Stan CAR model fits the data better.
Similarly, below shows the model and data used for Bayesian version of SAR model using stan.
sar_model = "
data {
int<lower=0> N;
vector[N] y;
matrix[N,N] W_tilde;
matrix[N,N] D;
}
transformed data {
matrix[N,N] I = diag_matrix(rep_vector(1, N));
}
parameters {
vector[N] w_s;
real beta;
real<lower=0> sigma2;
real<lower=0> sigma2_w;
real<lower=0,upper=1> phi;
}
transformed parameters {
vector[N] y_pred = beta + w_s;
}
model {
matrix[N,N] C = I - phi * W_tilde;
matrix[N,N] Sigma_inv = C' * D * C / sigma2;
w_s ~ multi_normal_prec(rep_vector(0,N), Sigma_inv);
beta ~ normal(0,10);
sigma2 ~ cauchy(0,5);
sigma2_w ~ cauchy(0,5);
y ~ normal(beta + w_s, sigma2_w);
}
"
D = diag(rowSums(W))
D_inv = diag(1/diag(D))
data = list(
N = nrow(areal_aq),
y = areal_aq$PM2.5,
x = rep(1, nrow(areal_aq)),
D_inv = D_inv,
W_tilde = D_inv %*% W
)
The estimation, prediction and residuals are shown below:
| term | post_mean | post_med | post_lower | post_upper |
|---|---|---|---|---|
| beta | 12.250 | 14.517 | -2.336 | 19.985 |
| phi | 0.822 | 0.889 | 0.304 | 0.988 |
| sigma2 | 13.459 | 11.879 | 1.811 | 30.679 |
| sigma2_w | 1.842 | 1.772 | 0.747 | 3.497 |
The traceplots of SAR model converges somewhat better than the CAR one. Again, beta and phi seems reasonable while the sigma’s are quite large. Compared to the frequentist SAR, The range of the resiuals is obviously narrower. So the Stan SAR model fits the data better.
We first conduct backward selction on the linear regression of air quality variables. Variable PM10 and CO are found as the best predictors and are used in the spatial GLM. CAR model shown below is used for the spatial part and 500000 iterations are taken.
car_model = "model{
for(i in 1:length(y)) {
y[i] ~ dnorm(mu[i],inv.var)
y_pred[i] ~ dnorm(mu[i],inv.var)
mu[i] <- X[i,] %*% beta + omega[i]
}
for(i in 1:3) {
beta[i] ~ dnorm(0,1)
}
omega ~ dmnorm(rep(0,length(y)), tau * (D - phi*W))
sigma2 = 1/tau
tau ~ dgamma(2, 2)
phi ~ dunif(0,0.99)
inv.var ~ dgamma(0.01, 0.01)
}"
The estimation, prediction and residuals are shown below:
| term | post_mean | post_med | post_lower | post_upper |
|---|---|---|---|---|
| beta[1] | 0.733 | 0.752 | -1.279 | 2.677 |
| beta[2] | 1.107 | 1.118 | -0.294 | 2.301 |
| beta[3] | 0.581 | 0.591 | -0.697 | 1.758 |
| phi | 0.987 | 0.988 | 0.977 | 0.990 |
| sigma2 | 25.420 | 23.412 | 12.146 | 47.312 |
The traceplots looks fairly reasonable. Compared to the previous CAR and SAR models, The range of the resiuals is obviously narrower. However, we should consider overfitting problem for residuals this small.
Now we introduce the meterology data into our model. We first conduct backward selction on the linear regression of both air quality and meteorolgy variables. Variables pressure, wind_direction, PM10, NO2, and SO2 are found as the best predictors and are used in the spatial GLM. The same CAR model is used for the spatial part and 500000 iterations are taken.
The estimation, prediction and residuals are shown below:
| term | post_mean | post_med | post_lower | post_upper |
|---|---|---|---|---|
| beta[1] | 0.548 | 0.554 | -1.458 | 2.439 |
| beta[2] | -0.384 | -0.365 | -1.821 | 1.040 |
| beta[3] | -0.116 | -0.079 | -1.329 | 1.055 |
| beta[4] | 1.072 | 1.102 | -0.412 | 2.538 |
| beta[5] | 0.501 | 0.508 | -0.994 | 1.903 |
| beta[6] | -0.008 | -0.006 | -1.379 | 1.335 |
| phi | 0.985 | 0.987 | 0.967 | 0.990 |
| sigma2 | 26.283 | 22.898 | 9.956 | 63.843 |
Similar to previous model, the traceplots looks fairly reasonable. Compared to the previous CAR and SAR models, The range of the resiuals is obviously narrower. However, we should consider overfitting problem for residuals this small and for this many of predictors. Also, since we don’t have meteorology data for the central four districts, no fitted vlaues are provided.
We summarize the rmse and residual Moran I of each areal model in the kable below:
| Model | RMSE | Moran |
|---|---|---|
| CAR | 2.6610114 | -0.0846640 |
| SAR | 2.7055745 | 0.0105054 |
| Bayesian CAR | 1.0188925 | 0.2265644 |
| Bayesian SAR | 1.2142537 | -0.0939156 |
| aq lm CAR | 0.1527868 | -0.3314684 |
| aq_meo lm CAR | 0.1506335 | -0.3063892 |
The Spatial GLM model using air quality data has the lowest rmse and fitted our PM2.5 data the most, but we do suspect overfitting for rmse this small. The SAR model has the residual Moran I closest to 0, so it captured the spatial autocorrelation in the data best. In general, the Bayesian SAR model might be our best pick, since it’s rmse is reasonably small and residual Moran I close to 0.
Since we have 35 stations all over Beijing, it is possible for us to fit a point reference model and try to get a heat map for PM2.5 concentration across the city.
We first divide Beijing into \(150\times 200\) grids using raster, and fit a Thin Plate Splines model using fields::Tps with PM2.5 concentration only. The fitted plot is shown below, with the true PM2.5 data marked as points under the same color scale. We can see that the fitted values show obvious spatial trend, and the color of the true values matches the background in general.
To use Gaussian Process model, we need to check if the data is stationary and isotropic. According to the plot below, it seems reasonable for us to make these assumptions.
Then we move on to a Gaussian Process spatial model with exponential covariance function model using Jags. The model is specified below and only PM2.5 concentration is used. The pior of \(phi\) is set according to the variogram above.
gplm = "model{
for(i in 1:length(y)){
y[i] ~ dnorm(beta + w[i], tau)
y_pred[i] ~ dnorm(beta + w[i], tau)
mu_w[i] = 0
}
for(i in 1:length(y)){
for(j in 1:length(y)){
Sigma_w[i,j] = sigma2_w * exp(-phi * d[i,j])
}
}
w ~ dmnorm(mu_w, inverse(Sigma_w))
beta ~ dnorm(0, 1/1000)
sigma2_w ~ dgamma(2, 2)
sigma2 ~ dgamma(2, 2)
tau = 1/sigma2
phi ~ dunif(3/0.6, 3/0.05)
}"
The estimation and prediction are shown below. The fitted plot has true PM2.5 data marked as black points. We can see that less general trend is indicated, and the PM2.5 concentration at locations without air quality stations generally takes the average value.
| term | post_mean | post_med | post_lower | post_upper |
|---|---|---|---|---|
| beta | 15.281 | 15.295 | 13.734 | 16.665 |
| phi | 14.135 | 9.244 | 5.133 | 45.432 |
| sigma2 | 3.595 | 3.500 | 1.203 | 6.767 |
| sigma2_w | 4.517 | 4.431 | 1.702 | 7.863 |
Finally we repeat Gaussian Process spatial model with exponential covariance function model using spBayes. The model is specified below and only PM2.5 concentration is used.
n = nrow(OneDay_df)
n_samp = 100000
max_range = max(dist(coords)) / 4
starting = list(phi = 3/0.1, sigma.sq = 10, tau.sq = 1)
tuning = list("phi"=0.1, "sigma.sq"=0.1, "tau.sq"=0.1)
priors = list(
beta.Norm = list(0, 1000),
phi.Unif = c(3/(max_range), 3/(0.05)),
sigma.sq.IG = c(2, 2),
tau.sq.IG = c(2, 2)
)
mod_spBayes = spBayes::spLM(PM2.5 ~ 1, data = OneDay_df, coords = coords,
starting = starting, priors = priors,
cov.model = "exponential", n.samples = n_samp,
tuning = tuning,n.report = n_samp/2, verbose=FALSE)
mod_spBayes = spBayes::spRecover(mod_spBayes, start=n_samp/2+1, thin = (n_samp/2)/1000,
verbose=FALSE)
cat("Report interval Metrop. Acceptance rate:", mean(mod_spBayes$acceptance))
## Report interval Metrop. Acceptance rate: 62.422
The estimates and predictions are shown below:
The fitted map indicates overfitting, since the plot is even less smoothed than the Jags GP model, and for locations other than the air quality stations, the PM2.5 concentration is assigned average level with random noise.
| Model | RMSE |
|---|---|
| TPS | 1.857203 |
| JAGS GP | 1.666784 |
| spBayes GP | 0.000000 |
According to the rmse of areal data, the rmse of TPS and JAGS GP seems reasonably well. However, the extremly low rmse of spBayes GP suggests overfitting, which accords with our fitted map.
In order to build a spatial-temporal model, we use monthly average data of all the air pollutants in each air quality stations by averaging all the observations thoughout each month.
We first create the data frame of monthly observations of all the pollutants. (Code chunk hidden)
The following plots give us a rough idea about how the PM2.5 concentration varies in different month and different stations.
We use the important variables, \(PM10\), \(SO_2\) and \(CO\), that we get from the random forest model to build a dynamic spatial-temporal model of PM2.5 concentration. The model and inputs are shown in the code below.
# inputs
coords = dplyr::select(aq_df, longitude, latitude) %>% as.matrix()
max_range = max(dist(coords)) / 4
max_d = coords %>% dist() %>% max()
n_t = 13
n_s = nrow(aq_df)
n_beta = 4
starting = list(
beta = rep(0, n_t * n_beta), phi = rep(3/(max_d/4), n_t),
sigma.sq = rep(1, n_t), tau.sq = rep(1, n_t),
sigma.eta = diag(0.01, n_beta)
)
tuning = list(phi = rep(1, n_t))
priors = list(
beta.0.Norm = list(rep(0, n_beta), diag(1000, n_beta)),
phi.Unif = list(rep(3/(0.9 * max_d), n_t), rep(3/(0.1 * max_d), n_t)),
sigma.sq.IG = list(rep(2, n_t), rep(2, n_t)),
tau.sq.IG = list(rep(2, n_t), rep(2, n_t)),
sigma.eta.IW = list(2, diag(0.001, n_beta))
)
n_samples = 10000
models = lapply(paste0("PM2.5_", 0:12, "~PM10_", 0:12, "+SO2_", 0:12, "+CO_", 0:12), as.formula)
m = spBayes::spDynLM(
models, data = aq_df, coords = coords, get.fitted = TRUE,
starting = starting, tuning = tuning, priors = priors,
cov.model = "exponential", n.samples = n_samples, n.report = 1000, verbose=FALSE)
m = clean_spdynlm(m, n_samples/2+1, n_samples, (n_samples/2)/1000)
save(m, file = "sp_temp_mod.Rdata")
The following plots demonstrate the posterior inference of \(\beta\)’s, \(\theta\) and the predicted value. From the observed v.s. predited plot, we can see that, our estimation is not very accurate, which may be because of inappropriate start point or prior of \(\phi\).
Among all the temporal models, the lm Gaussian Process model using air quality data has the lowest rmse and fitted our PM2.5 data the most. However, due to extreme low RMSE and its highly closed predicted & observed lines, we can say that overfitting problem does exist.
Among areal spatial models, the Spatial GLM model using air quality data has the lowest rmse and fitted our PM2.5 data the most, but we do suspect overfitting for rmse this small. The SAR model has the residual Moran I closest to 0, so it captured the spatial autocorrelation in the data best. In general, the Bayesian SAR model might be our best pick, since it’s rmse is reasonably small and residual Moran I close to 0.
Among point reference models, the TPS model best shows the general spatial trend of PM2.5 concentration in Beijing. While the extremly low rmse of spBayes GP suggests overfitting, which accords with our fitted map.
For the spatial-temporal model, the inappropriate start point or prior of \(\phi\) leads to the large \(\phi\) estimation, and the resulted prediction is unsatisfying. In the future, we may try exploring better starting point for prior of \(\phi\).